(54) Entertainment apparatus and method for reflecting input voice in operation of character



(57) A sound interval and a sound volume are extracted from the voice of a player inputted through a microphone, to capture changes in the sound interval and the sound volume in words. The difference between these data and reference data recorded in reference voice data 303 is calculated, and the inputted words are evaluated on the basis of the difference. With respect to a character as an operating object of the player, the contents of an operation of the character are determined by the evaluation, and the character reacts in real time. Thus, a game is realized in which the character makes real-time reactions to voice inputs from the player.

Description

Technical Field 

[0001] The present invention relates to an entertainment apparatus and method for reflecting a voice input from a player in the operation of a character.

Background of the Invention 

[0002] In a game played with an entertainment apparatus, etc., there are many cases in which a player gives commands to a player character, etc. as an operating object by using an input device such as a controller, a keyboard, etc. However, in recent years, there have appeared games in which a player gives commands with a voice input device such as a microphone, etc.
[0003] In such a game, for example, the contents of an input voice of the player are judged with voice recognition techniques such as analysis of a voice spectrum, pattern matching with a standard pattern, etc., and the character is made to take an action corresponding to the input voice of the player, to advance the game.

Disclosure of the Invention

[0004] However, recognizing the voice, and in particular interpreting the words of the player and reflecting their contents in the game, places a large burden on the device and takes processing time, which sometimes becomes a bottleneck to smooth advancement of the game. In particular, a big problem is caused when the voice input is applied to a game in which the character appearing in the game makes a real-time reaction to the voice of the player.
[0005] Therefore, games using voice input are limited mainly to games giving no importance to a real-time property, in which the player and the character have a talk and the character is a bit slow in giving an answer to the voice input of the player or taking action. This causes a problem of insufficient diversity.

[0006] An object of the invention is to provide a game 
in which the character makes a real-time reaction to the 
voice input. 

[0007] For overcoming the above problems, the present invention provides the following entertainment apparatus. Namely, it is an entertainment apparatus to which a voice input device for receiving a voice input from a player is connectable or provided, and which comprises character control means for controlling the operation of a game character; sound interval extracting means for extracting information of a relative sound interval from the voice of the player received through said voice input device; and sound volume extracting means for extracting information of a sound volume from the voice of the player received through said voice input device; wherein said character control means makes the character perform an operation on the basis of said extracted information of the relative sound interval and said extracted information of the sound volume.
[0008] Since processing is performed by extracting 
the information of the sound volume and the sound in- 
terval from the player voice as described above, a game 
can be smoothly advanced without imposing an exces- 
sive burden on the entertainment apparatus. 
[0009] Further, this entertainment apparatus can fur- 
ther comprise guide display means for outputting con- 
tents of the voice to be inputted by the player. 
[0010] Further, there may be employed a constitution in which the entertainment apparatus further comprises reference voice data storage means for storing voice data as an evaluation reference about the relative sound interval and the sound volume with respect to the voice to be inputted by the player, and said character control means periodically compares said extracted information of the relative sound interval and said extracted information of the sound volume with the voice data as said evaluation reference, and determines operation contents of the character on the basis of results of the comparison.

[0011] Further, the operation of said character is 
shown by regenerating image data prepared in ad- 
vance, and said character control means can change a 
regenerating speed of said image data on the basis of 
a difference between timing for outputting contents of 
the voice to be inputted by said player and timing for 
starting the input of the voice by the player. 
[0012] Further, there can be employed a constitution in which said character control means compares said extracted information of the relative sound interval and the voice data of the relative sound interval as said evaluation reference, to exaggerate an expression of the character as the extracted relative sound interval is higher than the relative sound interval as an evaluation reference, and to moderate the expression of the character as the extracted relative sound interval is lower than the relative sound interval as an evaluation reference as a result of this comparison, and said character control means compares said extracted information of the sound volume and the voice data of said sound volume as an evaluation reference, to exaggerate a behavior of the character as the extracted sound volume is larger than the sound volume as the evaluation reference, and to moderate the behavior of the character as the extracted sound volume is smaller than the sound volume as an evaluation reference.

Brief Description of the Drawings 

[0013] 

Fig. 1 is a block diagram for explaining the construction of the voice input operating system in the present embodiment.
Fig. 2 is a graph showing one example of changes in a sound interval and a sound volume when words are inputted by a voice.
Fig. 3 is a graph showing one example of the relationship between a player voice and a reference voice with respect to the changes in the sound interval and the sound volume.
Fig. 4 is a graph showing a difference between the player voice and the reference voice with respect to the changes in the sound interval and the sound volume.
Fig. 5 is a view showing a summary of the evaluation of an input voice evaluating function 2013 when sound volume evaluation is used as an example.
Figs. 6A to 6D are views showing an example of a change in operation of a character caused by a change in parameter.
Fig. 7 is a flow chart for explaining a processing flow when words are received from a player.
Fig. 8 is a block diagram for explaining the hardware construction of an entertainment apparatus 10.
Fig. 9 is a view for explaining a use state of the entertainment apparatus 10.

Best Mode for Carrying Out the Invention 

[0014] The embodiment modes of the present inven- 
tion will be explained in detail with reference to the draw- 
ings. 

[0015] First, the hardware constitution of an entertainment apparatus 10 including a voice input operating system in an embodiment mode of the present invention will be explained with reference to a block diagram shown in Fig. 8.

[0016] In this figure, the entertainment apparatus 10 has a main CPU 100, a graphics processor (GP) 110, an I/O processor (IOP) 120, a CD/DVD reading section 130, a sound processor unit (SPU) 140, a sound buffer 141, an OS-ROM 150, a main memory 160, an IOP memory 170 and a USB interface 175.
[0017] The main CPU 100 and the GP 110 are connected through an exclusive bus 101. The main CPU 100 and the IOP 120 are connected through a bus 102. The IOP 120, the CD/DVD reading section 130, the SPU 140 and the OS-ROM 150 are connected to a bus 103.
[0018] The main memory 160 is connected to the main CPU 100, and the IOP memory 170 is connected to the IOP 120. Further, a controller 180 and the USB interface 175 are connected to the IOP 120.
[0019] The main CPU 100 executes a program stored in the OS-ROM 150, or a program transferred from a CD/DVD-ROM, etc. to the main memory 160, to perform predetermined processing.

[0020] The GP 110 is a drawing processor for fulfilling a rendering function, etc. of the present entertainment apparatus, and performs drawing processing in accordance with commands from the main CPU 100.
[0021] The IOP 120 is a sub-processor for input-output for controlling transmission and reception of data between the main CPU 100 and a peripheral device, e.g., the CD/DVD reading section 130, the SPU 140, etc.
[0022] The CD/DVD reading section 130 reads data from a CD-ROM and a DVD-ROM mounted on a CD/DVD drive, and transfers these data to a buffer area 161 arranged in the main memory 160.
[0023] The SPU 140 regenerates compressed waveform data, etc., stored in the sound buffer 141 at a predetermined sampling frequency on the basis of pronouncing instructions from the main CPU 100, etc.
[0024] The OS-ROM 150 is a non-volatile memory storing a program, etc. executed by the main CPU 100 and the IOP 120 at a starting time.
[0025] The main memory 160 is a main memory device of the main CPU 100, and stores instructions executed by the main CPU 100, data utilized by the main CPU 100, etc. Further, the main memory 160 is provided with the buffer area 161 for temporarily storing data read from a recording medium such as CD-ROM, DVD-ROM, etc.
[0026] The IOP memory 170 is a main memory device of the IOP 120, and stores instructions executed by the IOP 120, data utilized by the main CPU 100, etc.
[0027] The controller 180 is an interface for receiving commands from an operator.
[0028] A USB microphone 17 is connected to the USB interface 175. When the voice of a player is inputted to the USB microphone 17, the USB microphone 17 performs A/D conversion, etc., using a predetermined sampling frequency and a quantized bit number, and sends voice data to the USB interface 175.

[0029] Fig. 9 is a view for explaining a use state of the entertainment apparatus 10. In this figure, the controller 180 is connected to a connector portion 12 of an entertainment apparatus main body 11. A cable 14 for an image voice output is connected to an image voice output terminal 13 of the entertainment apparatus main body 11. An image voice output device 15 of a television receiver, etc., is connected to the other end of this cable 14. An operator of the entertainment apparatus gives operation instructions with the controller 180. The entertainment apparatus 10 receives commands from the operator through the controller 180, and outputs image data and voice data corresponding to these commands to the image voice output device 15. The image voice output device 15 outputs an image and a voice.
[0030] The USB microphone 17 is connected to the USB connector 16 of the entertainment apparatus main body 11, and receives the voice input from the player.
[0031] The constitution of the voice input operating system of this embodiment will be explained with reference to the block diagram of Fig. 1 hereinafter. As shown in Fig. 1, the voice input operating system is constituted of a control section 201, an input control section 202, a display control section 203, scenario data 301, dynamic image data 302 and reference voice data 303.
[0032] The control section 201 has a game control function 2011, a subtitles control function 2012, an input voice evaluating function 2013 and a dynamic image control function 2014. The main CPU 100 mainly executes a program stored in the main memory 160, etc. so that the control section 201 is constructed on the main CPU 100, etc., to realize the respective functions.
[0033] In the game control function 2011, the control section 201 performs processing for reading the scenario data 301 and advancing a game on the basis of a predetermined story.
[0034] The above scenario data 301 are data read from the memory medium such as CD-ROM, DVD-ROM, etc., as required. For example, the scenario data 301 record data of a story development, subtitles data of words to be inputted by the player, and data of the response of a character to an input of the player, etc. These data are managed with an index, etc., attached thereto, and are displayed and regenerated in conformity with the story development using this index as a key.
[0035] In the subtitles control function 2012, the control section 201 performs processing for displaying subtitles recorded in the scenario data 301 in association with a scene in the story development, on a display unit through the display control section 203. These subtitles serve as a guide urging the player to input words by voice. Characters to be voice-inputted by the player at a certain time are displayed on the display unit with highlight processing, etc. (as in the guide display of lyrics in "karaoke") so as to make the player understand the contents of the characters.
[0036] In the input voice evaluating function 2013, the 
control section 201 evaluates the voice data inputted by 
the player through a voice input device such as a micro- 
phone, etc. in comparison with a reference voice record- 
ed in the reference voice data 303. 
[0037] Specifically, a fundamental frequency (the height of a sound) is extracted from the voice inputted by the player with an FFT, etc. (this can be realized in software, and can be constructed, e.g., within the control section 201) at predetermined intervals such as one tenth second, and a sound volume (sound pressure) is measured. An element for capturing the height of the sound is not limited to the extraction of the fundamental frequency, and, for example, a second formant of a voice spectrum, etc., may be also extracted and used.
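The periodic extraction described in paragraph [0037] could be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function names, the 80 to 400 Hz search range and the use of the strongest spectral bin as a stand-in for the fundamental frequency are all assumptions, and a naive DFT is used in place of an FFT to keep the example dependency-free.

```python
import cmath
import math


def frame_features(frame, sample_rate, fmin=80.0, fmax=400.0):
    """Return (peak frequency in Hz, RMS level) for one frame of samples.

    Assumption: the strongest DFT bin inside [fmin, fmax] approximates the
    fundamental frequency; a real voice may need a more robust estimator.
    """
    n = len(frame)
    # RMS level stands in for the "sound volume (sound pressure)".
    rms = math.sqrt(sum(x * x for x in frame) / n)
    # Naive DFT, evaluated only for bins inside the assumed voice range.
    k_lo = max(1, math.ceil(fmin * n / sample_rate))
    k_hi = min(n // 2, math.floor(fmax * n / sample_rate))
    best_k, best_mag = k_lo, -1.0
    for k in range(k_lo, k_hi + 1):
        w = -2j * math.pi * k / n
        mag = abs(sum(x * cmath.exp(w * t) for t, x in enumerate(frame)))
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * sample_rate / n, rms


def extract_series(samples, sample_rate, period=0.1):
    """Split the input into frames of `period` seconds and measure each one."""
    size = int(sample_rate * period)
    return [frame_features(samples[i:i + size], sample_rate)
            for i in range(0, len(samples) - size + 1, size)]
```

With a period of one tenth second, a two-second phrase yields twenty (frequency, volume) pairs, one per measuring point.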

[0038] In the voice inputted by the player, one phrase, 
i.e., words continuously inputted are to be used as one 
unit. This unit is displayed in one block in the subtitles, 
so that the player can recognize it. 
[0039] Fig. 2 is a graph showing one example of changes in the fundamental frequency and the sound volume when a word of "Kon-nichiwa (Hello)" is inputted by voice. When a time interval of the above word of the player is two seconds, the number of measuring points is 20, and the fundamental frequency and the sound volume each become time series data of twenty points. It is assumed that both the above fundamental frequency and the above sound volume are represented by values from 0 to 100, that the fundamental frequency is converted to a relative amount with a first measuring point as 50, and that the sound volume is represented as an absolute amount. Naturally, these values are not required to be strict, and these values may be set to such an extent that the degrees of changes in the fundamental frequency and the sound volume can be captured.
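One way the conversion to a relative amount with the first measuring point at 50 might look is sketched below. The text does not specify the mapping, so the semitone-based scaling and the `units_per_semitone` factor are purely assumptions chosen for illustration.

```python
import math


def to_relative_scale(freqs_hz, units_per_semitone=2.0):
    """Map a pitch series to a 0-100 scale with the first point fixed at 50.

    Assumption: the offset from the first point is measured in semitones and
    scaled by `units_per_semitone`; the source only requires that the degree
    of change can be captured.
    """
    base = freqs_hz[0]
    scaled = []
    for f in freqs_hz:
        semitones = 12.0 * math.log2(f / base)
        value = 50.0 + units_per_semitone * semitones
        # Clamp to the 0-100 range used for both series.
        scaled.append(max(0.0, min(100.0, value)))
    return scaled
```

Because only the ratio to the first point matters, a high-voiced and a low-voiced player producing the same intonation obtain the same series.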
[0040] As can be seen from this figure, this system is arranged to capture the sound volume and the sound interval of the player voice, but not to judge any pronunciation. For example, when the same time is taken to input "Koon-nichiwa" and "Kon-nichiiwa" with the same sound volume change and the same intonation change over time, the system treats these inputted voices as the same voice. The same applies when "Ah..." is inputted.

[0041] Since no pronunciation is evaluated in this system as described above, a game can be executed without imposing any excessive burden on processing although the voice input is treated. Namely, in general, the sound interval and the sound volume are easily extracted and approximately real-time extraction can be made, so that little or no influence is exerted on the processing speed of the game.

[0042] The reference voice data 303 records voice data as a reference of the evaluation of the word inputted by the player, and has data converted from the change in the fundamental frequency and the change in the sound volume sampled at predetermined intervals mentioned above.

[0043] When the input voice evaluating function 2013 has extracted the reference voice corresponding to the word to be inputted by the player from the reference voice data 303 and detects the start of voice input from the player, it calculates the difference between the voices and evaluates it every predetermined period, using the starting time point of the word as a reference.

[0044] For example, when the reference voice data 303 of the word "Kon-nichiwa" are shown by a broken line of the graph shown in Fig. 3 and input data of the player are shown by a solid line, the differences are provided as shown in Fig. 4.
[0045] In the input voice evaluating function 2013, the input voice is evaluated on the basis of this difference at intervals of a predetermined period.
[0046] The change in the fundamental frequency is captured as a change in height of the word, i.e., intonation, and the evaluation based on the difference between the voice of the player and the reference voice is reflected in a change in expression of a character. The change in the fundamental frequency is determined as a relative amount, since the difference in fundamental height between individual voices is taken into account.
[0047] The change in the sound volume is captured as an empathy degree, and the evaluation based on the difference (tension value) between the voice of the player and the reference voice is reflected in a change in behavior of hands, feet, etc., of the character.
[0048] In the graph shown in Fig. 4, the evaluation reference is determined such that the tension becomes higher as the value of the voice of the player increases relative to the reference voice, and the tension becomes lower as the value of the voice of the player decreases relative to the reference voice. Further, the degree of tension is arranged to increase as the magnitude of the plus or minus difference increases.

[0049] In the example shown in Fig. 4, the tension is 
changed between the high tension and the low tension 
within the word in the sound volume evaluation, and the 
tension as a whole is high in the sound interval change. 
[0050] In the input voice evaluating function 2013, the 
above evaluation is carried out at intervals of a prede- 
termined period and is also carried out for each phrase. 
This evaluation is carried out as to how far the input 
voice as the entire word of "Kon-nichiwa" is separated 
from the reference voice. For example, in Fig. 4, the dis- 
tance (an absolute value of the difference between the 
player voice and the reference voice at each measuring 
point) from a line of +0 is calculated with respect to a 
value of the difference every predetermined period, and 
a total sum of this distance can be evaluated. In the eval- 
uation, as the absolute value decreases, the player 
voice is closer to the reference voice, so that a high eval- 
uation is given. 

[0051] Fig. 5 summarizes the evaluation in the above input voice evaluating function 2013 using the sound volume evaluation as an example. For simplification, it is supposed that the evaluation is carried out at five measuring points. In this figure, the tension value as the difference between the player input voice and the reference voice at each measuring point, i.e., at intervals of a predetermined period, changes like +10, ±0, -10, -20, +10, and the evaluation of a phrase becomes 50 as a total of the distances (absolute values).
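The arithmetic of this example can be checked with a short sketch. The function names and the concrete player and reference volumes are hypothetical; only the resulting tension values +10, ±0, -10, -20, +10 and the total of 50 come from the text.

```python
def tension_values(player, reference):
    """Per-point difference between the player voice and the reference voice."""
    return [p - r for p, r in zip(player, reference)]


def phrase_distance(tensions):
    """Phrase evaluation: total distance from the +/-0 line (smaller is better)."""
    return sum(abs(t) for t in tensions)


# Hypothetical volumes chosen so the differences match the Fig. 5 example.
reference = [50, 50, 50, 50, 50]
player = [60, 50, 40, 30, 60]
tensions = tension_values(player, reference)   # [10, 0, -10, -20, 10]
total = phrase_distance(tensions)              # 10 + 0 + 10 + 20 + 10 = 50
```

A player who matched the reference exactly would obtain a total of 0, the best possible phrase evaluation under this measure.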
[0052] For example, there may be employed a constitution in which, when the termination of the player input voice and the termination of the reference voice do not occur at the same time, i.e., one of them is terminated earlier, the evaluation at intervals of a predetermined period is terminated upon termination of one of them, and the phrase evaluation is arranged to give a bad evaluation on the assumption that the speed of the word is not accurate and that the subsequent difference is a maximum value.
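A sketch of this handling of unequal lengths follows. The function name and the value of `MAX_DIFFERENCE` are assumptions; the text says only that the remaining difference is taken to be "a maximum value", which is read here as the full width of the 0 to 100 scale.

```python
MAX_DIFFERENCE = 100  # assumed: largest possible gap on the 0-100 scale


def phrase_distance_with_penalty(player, reference, max_diff=MAX_DIFFERENCE):
    """Total distance, charging the maximum difference for every measuring
    point that exists in only one of the two series."""
    common = min(len(player), len(reference))
    distance = sum(abs(p - r)
                   for p, r in zip(player[:common], reference[:common]))
    missing = max(len(player), len(reference)) - common
    return distance + missing * max_diff
```

For instance, a player series `[60, 50]` against a four-point reference of 50s gives 10 for the overlapping points plus 2 x 100 for the missing ones, so speaking too fast or too slowly is penalised heavily.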
[0053] In the dynamic image control function 2014, 
there is performed a processing for reading the dynamic 
image data 302 recording the operation of the character 
and reflecting evaluation results of the input voice eval- 
uating function 2013 in the operation of the character. 
[0054] The above dynamic image data 302 are data that are read from a recording medium such as CD-ROM, DVD-ROM, etc., as required, and the dynamic image data 302 record data of the operation of the character in accordance with the story development. The data of the operation of the character recorded in the dynamic image data 302, particularly of the character that is an operating object of the player, are arranged such that movements for expressing a look, feeling, etc., e.g., the size of eyes, the opening degree of a mouth, the magnitude of a gesture, etc., can be changed by parameters showing states.

[0055] For example, when data of operation of the character that is to be "surprised" are recorded as the dynamic image data 302, parameters for the size of the eye, the opening degree of the mouth and the movement of a hand can be changed. Each parameter can take one of three states of "exaggeration", "usual" and "moderate". Figs. 6A to 6D show examples of a change in the operation of the character caused by a change in the parameters in this case.
[0056] Fig. 6A shows a character operation when all of the size of the eye, the opening degree of the mouth and the movement of the hand are set to "exaggeration". Fig. 6B shows a character operation when all of the size of the eye, the opening degree of the mouth and the movement of the hand are set to "usual". Fig. 6C shows a character operation when all of the size of the eye, the opening degree of the mouth and the movement of the hand are set to "moderate". Fig. 6D shows a character operation when the size of the eye and the opening degree of the mouth are set to "usual" and the movement of the hand is set to "exaggeration".
[0057] The operation of the character can thus be changed on the basis of a combination of the parameters in the dynamic image control function 2014.
[0058] The input control section 202 performs the control of an input voice signal from a microphone connected as an input device, etc.

[0059] The display control section 203 is constructed on the GP 110 in accordance with commands of the main CPU 100, etc., and generates display screen data on the basis of screen data in which image data received from the control section 201 are transferred from a game processing section 802. The generated display screen data are outputted to a display unit, and the display unit receiving these data displays an image on the display screen according to the display screen data.
[0060] The operation of the entertainment apparatus 10 in this embodiment will be explained below.
[0061] When a game is started, the control section 201 reads scenario data 301, and regenerates dynamic image data 302 associated with the scenario. When a scene appears in which a character in charge of a player says words, subtitles of the words are displayed to urge the player to input a voice.

[0062] Fig. 7 is a flow chart for explaining a processing flow in the above case.
[0063] First, as described above, the control section 201 causes the display unit to display words to be inputted by the player through the display control section 203 (S101).
[0064] The control section 201 then highlights characters to be read with respect to these subtitles, to urge the player to input a voice (S102).
[0065] The player is to input words in conformity with this highlight display.

[0066] Information on how to input the words may also be displayed together with these subtitles. For example, when a scene is an impact-giving scene and a reference voice is recorded in exaggeration, a guide indicating an "exaggeration" expression is displayed to urge the player to input the words in an exaggerated manner. At this time, the player obtains a high evaluation by inputting the words in accordance with the guide of the expression.
[0067] When the player starts to input the words within predetermined time periods before and after the time point of the highlighted display of the first character of the subtitles, e.g., in a range within one second or two seconds, the control section 201 treats this input of the words as a valid input (S103). When the input of the words is started before or after this range, the control section 201 treats the input as an invalid input, and reduces the evaluation with respect to the input (S104).
[0068] When the input of the words is started within the above valid period, and when the starting time point of the input of the words comes earlier than the time point of the highlighted display of the first character of the subtitles, the control section 201 regenerates the dynamic image data 302 associated with the scenario related to the input of the words at a decreased regenerating speed (S106). In contrast, when the starting time point of the input of the words comes later than the time point of the highlighted display of the first character of the subtitles, the control section 201 regenerates the dynamic image data 302 associated with the scenario related to the input of the words at an increased regenerating speed (S107). The degrees of increasing and decreasing the regenerating speed are proportional to the difference between the time point of the highlighted display of the first character of the subtitles and the starting time point of the input of the words.
[0069] Namely, even when the word input starting time point of the player is shifted from the starting time point of the subtitles, the control section 201 adjusts the termination timing of these words and the termination timing of the operation of the character with respect to these words such that these timings agree with each other.
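The timing rules of steps S103 to S107 might be sketched as below. The function names, the one-second window and the proportionality constant `gain` are assumptions made for illustration; the text fixes only the direction of the change (slower for an early start, faster for a late start) and its proportionality to the time difference.

```python
def is_valid_start(offset_s, window_s=1.0):
    """offset_s = input start time minus highlight time (negative = early).

    The input is treated as valid when it starts inside the window around
    the highlight of the first character (window width is an assumption).
    """
    return abs(offset_s) <= window_s


def playback_speed(offset_s, gain=0.25):
    """Regenerating speed relative to normal (1.0).

    An early start (negative offset) lowers the speed and a late start
    raises it, in proportion to the offset; `gain` is a hypothetical
    proportionality constant.
    """
    return 1.0 + gain * offset_s
```

For example, with these assumed constants a player who starts 0.4 second early gets a speed of 0.9, and one who starts 0.4 second late gets 1.1, while a start 1.5 seconds off is rejected as invalid before any speed is chosen.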
[0070] The control section 201 carries out the above 
evaluation with respect to the voice input of the player 
at intervals of a predetermined period, e.g., at intervals 
of one tenth second (S108). The control section 201 
then instantly adjusts the operation of the character on 
the basis of this evaluation, and reflects the evaluation 
in a picture image (S109). The control section 201 re- 
peats this operation until the voice input of the player is 
terminated (S110). 

[0071] The above processing will be explained. 
[0072] The control section 201 calculates differences in sound volume and sound interval between the player voice and the reference voice at intervals of a predetermined period as described above. The control section 201 uses values of these differences as a sound volume tension value and a sound interval tension value, respectively.

[0073] In the dynamic image data 302, the tension values and parameters of the operation of the character are associated with each other. For example, when the operation of the character that is "surprised" is performed, the sound volume tension value is associated with the movement of hands of the character. When the sound volume tension value is smaller than -25, "moderate" is set. When the sound volume tension value is -25 or more and is smaller than +25, "usual" is set. When the sound volume tension value is +25 or more, "exaggeration" is set.
[0074] Further, the sound interval tension value is associated with the size of the eyes and the opening degree of the mouth of the character. When the sound interval tension value is smaller than -25, "moderate" is set. When the sound interval tension value is -25 or more and is smaller than +25, "usual" is set. When the sound interval tension value is +25 or more, "exaggeration" is set.
[0075] The control section 201 calculates the sound volume tension value and the sound interval tension value at intervals of a predetermined period, and determines parameter contents of the operation on the basis of these values. For example, when the sound volume tension is +30 and when the sound interval tension is +10, the movement of the hands is "exaggeration", and the size of the eye and the opening degree of the mouth are "usual".
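The threshold mapping of paragraphs [0073] to [0075] can be written out as a small sketch. The thresholds of -25 and +25 and the three state names are taken from the text; the function and key names are hypothetical.

```python
def state_for(tension):
    """Map a tension value to one of the three parameter states."""
    if tension < -25:
        return "moderate"
    if tension < 25:
        return "usual"
    return "exaggeration"


def character_parameters(volume_tension, interval_tension):
    """Sound volume drives the hands; sound interval drives eyes and mouth."""
    return {
        "hand_movement": state_for(volume_tension),
        "eye_size": state_for(interval_tension),
        "mouth_opening": state_for(interval_tension),
    }
```

Calling `character_parameters(30, 10)` reproduces the example in the text: the hand movement is "exaggeration" while the eye size and mouth opening are "usual", which corresponds to the state shown in Fig. 6D.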

[0076] The control section 201 generates an image corresponding to these parameter contents, and causes the display unit to display the image through the display control section 203.
[0077] The above processing is performed at intervals of a predetermined period until the input of words by the player is terminated, whereby the character can be caused to perform an operation that is a real-time reaction to the input by the player. The words to be inputted by the player are taken in the unit of one phrase. There is therefore employed a constitution in which, when there is no voice from the player for a predetermined period of time, e.g., 0.5 second, after the start of input of the voice is detected, it is judged that the input of words is terminated.
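The end-of-phrase rule could be sketched as follows, working on the sound volume series measured every one tenth second. The function name and the `silence_level` below which a frame counts as "no voice" are assumptions; the 0.5 second timeout is the example value from the text.

```python
def input_end_index(volumes, silence_level=5, period_s=0.1, timeout_s=0.5):
    """Return the index of the measuring point at which the input is judged
    terminated, or None if the phrase has not ended yet.

    Silence before the first voiced frame is ignored, since the timeout
    only runs after the start of input has been detected.
    """
    needed = round(timeout_s / period_s)  # consecutive silent frames required
    started = False
    quiet = 0
    for i, v in enumerate(volumes):
        if v > silence_level:
            started = True
            quiet = 0
        elif started:
            quiet += 1
            if quiet >= needed:
                return i
    return None
```

A short pause inside the phrase, such as two silent frames between syllables, resets the counter and does not end the input.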

[0078] When the input of words by the player is terminated, the control section 201 evaluates the entire words as described above (S111). This evaluation shows how close to the reference voice the player was able to input the voice. For example, there is employed a constitution in which these evaluations are accumulated through a certain story, and as a result, when a certain evaluation cannot be obtained, the story cannot proceed to a next story, so that the game property is improved.

EP 1 201 277 A2

[0079] The invention is not limited to the above embodiment modes, but can be variously modified within the scope of features of the present invention.
[0080] For example, in the above example, the behavior of the character is determined by associating the tension values with the contents ("exaggeration", "usual" and "moderate") of the parameters. However, the tension values and the behavior of the character may be directly associated with each other (e.g., the size of the eyes is classified into 0 to 100), and the tension values may be also used as parameters as they are.
[0081] Further, the appearance and the hardware construction of the entertainment apparatus 10 are not limited to those shown in Figs. 8 and 9. For example, the entertainment apparatus 10 may have the constitution of a general electronic computer including a CPU, a memory, an external memory device such as a hard disk unit, a reader for reading data from a memory medium having portability such as CD-ROM, DVD-ROM, etc., input devices such as a keyboard, a mouse, a microphone, etc., a display unit such as a display, etc., a data communication device for performing communication through a network such as the Internet, etc., and an interface for transmitting and receiving data between the above respective devices. This case may employ a constitution in which the program and various kinds of data for constructing the constitution shown in Fig. 1 onto the entertainment apparatus 10 are read from the memory medium having the portability through the reader, and are stored in the memory or the external storage device. Otherwise, the program and these data may be downloaded from the network through a data communication device, to be stored in the memory or the external memory device.

[0082] As described above, in accordance with the 
present invention, it is possible to realize a game in 
which the character makes real-time reactions to inputs 
of voices. 



Claims 

1. An entertainment apparatus with which a voice in- 
put device for receiving a voice input from a player 
is usable, the entertainment apparatus comprising 

character control means for controlling the op- 
eration of a game character; 
sound interval extracting means for extracting 
information of a relative sound interval from the
voice of the player received through said voice 
input device; and 

sound volume extracting means for extracting 
information of a sound volume from the voice 
of the player received through said voice input 
device; 



wherein said character control means evalu- 
ates said extracted information of the relative sound 
interval and makes the character perform an oper- 
ation according to a result of the evaluation. 

2. The entertainment apparatus according to claim 1,
which further comprises;

guide display means for indicating contents of
the voice to be inputted by the player.

3. The entertainment apparatus according to claim 2, 
which further comprises; 

reference voice data storage means for storing 
voice data as an evaluation reference about the 
relative sound interval and the sound volume 
with respect to the voice to be inputted by the 
player, wherein; 

said character control means periodically com- 
pares said extracted information of the relative 
sound interval and said extracted information 
of the sound volume with the voice data as said 
evaluation reference, and determines opera- 
tion contents of the character on the basis of 
results of the comparison. 

4. The entertainment apparatus according to claim 2, 
which further comprises; 

expression mode display means for indicating 
an expression mode of the voice to be inputted by 
the player. 

5. The entertainment apparatus according to claim 3, 
wherein 

the operation of said character is shown by re- 
generating image data prepared in advance, 
and 

said character control means changes a regen- 
erating speed of said image data on the basis 
of the difference between timing for indicating 
contents of the voice to be inputted by said 
player and timing for starting the input of the 
voice by the player. 

6. The entertainment apparatus according to claim 3, 
wherein 

said character control means compares said 
extracted information of the relative sound interval
and the voice data of the relative sound
interval as said evaluation reference, and, as a 
result of the comparison, 
said character control means exaggerates an 
expression of the character as the extracted rel- 
ative sound interval is higher than the relative 
sound interval as the evaluation reference, and 
moderates the expression of the character as
the extracted relative sound interval is lower 
than the relative sound interval as the evalua- 
tion reference. 

7. The entertainment apparatus according to claim 3, 
wherein 

said character control means compares said 
extracted information of the sound volume and the 
voice data of the sound volume as said evaluation 
reference, and as a result of this comparison, said 
control means exaggerates a behavior of the char- 
acter as the extracted sound volume is larger than 
the sound volume as the evaluation reference, and 
moderates the behavior of the character as the extracted
sound volume is smaller than the sound volume
as the evaluation reference.

8. A method for controlling the operation of a character 
in a game executed by an entertainment apparatus, 
comprising: 

extracting information of a relative sound inter- 
val and information of a sound volume from 
voice data of a player upon receipt of a voice
input of the player, and

changing the operation of the character on the 
basis of said extracted information of the relative
sound interval and said extracted information
of the sound volume.

9. The method for controlling the operation of a char- 
acter as recited in claim 8, wherein 

contents of the voice to be inputted by the 
player are displayed before the reception of the 
voice input of the player.

10. The method for controlling the operation of a char- 
acter as recited in claim 9, wherein 

said extracted information of the relative 
sound interval and said extracted information of the
sound volume are periodically compared with the 
voice data as an evaluation reference with respect 
to the relative sound interval and the sound volume 
prepared in advance, and the change in the operation
of said character is determined on the basis of
a result of the comparison. 

11. The method for controlling the operation of a char- 
acter as recited in claim 9, wherein 

an expression mode of the voice to be input- 
ted by the player is displayed together with the con- 
tents of the voice to be inputted by said player be- 
fore the reception of the voice input of the player. 

12. The method for controlling the operation of a char- 
acter as recited in claim 10, wherein 
the operation of said character is shown by regenerating
image data prepared in advance,
and 

a regenerating speed of said image data is
changed on the basis of the difference between
timing for outputting the contents of the voice
to be inputted by said player, and timing for
starting the input of the voice by the player.

13. The method for controlling the operation of a character
as recited in claim 10, wherein said extracted
information of the relative sound interval and the
voice data of the relative sound interval as said evaluation
reference are compared, and as a result, an
expression of the character is exaggerated as the
extracted relative sound interval is higher than the
relative sound interval as the evaluation reference,
and the expression of the character is set to be moderate
as the extracted relative sound interval is lower
than the relative sound interval as the evaluation
reference.

14. The method for controlling the operation of a character
as recited in claim 10, wherein said extracted
information of the sound volume and the voice data
of the sound volume as said evaluation reference
are compared, and as a result, a behavior of the
character is exaggerated as the extracted sound
volume is larger than the sound volume as the evaluation
reference, and the behavior of the character
is moderated as the extracted sound volume is
smaller than the sound volume as the evaluation
reference.

15. A storage medium having a program recorded
therein, said program executable in an entertainment
apparatus to be usable with a voice input device
for receiving a voice input from a player,

wherein said program causes the entertainment
apparatus to perform the steps of:

sound interval extracting processing for extracting
information of a relative sound interval
from the voice of the player received through
said voice input device;

sound volume extracting processing for extracting
information of a sound volume from the
voice of the player received through said voice
input device; and

character control processing for evaluating said
extracted information of the relative sound interval
and said extracted information of the
sound volume, and making the character perform
an operation according to a result of the
evaluation.

16. The storage medium according to claim 15, wherein 
said program causes the entertainment apparatus
further to perform guide display processing for
indicating contents of the voice to be inputted by the 
player. 

17. The storage medium according to claim 16, wherein

said program causes the entertainment apparatus
further to perform processing for referring
to reference voice data for storing voice data
as an evaluation reference about the relative
sound interval and the sound volume with respect
to the voice to be inputted by the player,
and

in said character control processing, said extracted
information of the relative sound interval
and said extracted information of the sound
volume are periodically compared with the
voice data as said evaluation reference, and results
of the comparison determine operation
contents of the character.

18. The storage medium according to claim 16, wherein

said program causes the entertainment apparatus
further to perform expression mode display
processing for indicating an expression mode of the
voice to be inputted by the player.

19. The storage medium according to claim 17, wherein

the operation of said character is shown by regenerating
image data prepared in advance,
and 

said character control processing includes 
changing a regenerating speed of said image 
data on the basis of the difference between tim- 
ing for indicating contents of the voice to be in- 
putted by said player and timing for starting the 
input of the voice by the player. 



20. The storage medium according to claim 17, wherein
said character control processing includes
comparing said extracted information of the relative
sound interval and the voice data of the relative
sound interval as said evaluation reference, and as
a result, exaggerating an expression of the character
as the extracted relative sound interval is higher
than the relative sound interval as the evaluation
reference, and moderating the expression of the
character as the extracted relative sound interval is
lower than the relative sound interval as the evaluation
reference.

21. The storage medium according to claim 17, wherein
said character control processing includes
comparing said extracted information of the sound
volume and the voice data of the sound volume as
said evaluation reference, and as a result, exaggerating
a behavior of the character as the extracted
sound volume is larger than the sound volume as
the evaluation reference, and moderating the behavior
of the character as the extracted sound volume
is smaller than the sound volume as the evaluation
reference.

22. A program executable in an entertainment apparatus
to be usable with a voice input device for receiving
a voice input from a player,

wherein said program causes the entertainment
apparatus to perform the steps of:

sound interval extracting processing for extracting
information of a relative sound interval
from the voice of the player received through
said voice input device;

sound volume extracting processing for extracting
information of a sound volume from the
voice of the player received through said voice
input device; and

character control processing for evaluating said
extracted information of the relative sound interval
and said extracted information of the
sound volume, and making the character perform
an operation according to a result of the
evaluation.